Abstract

Binary measurements arise naturally in a variety of statistical and engineering applications.
They may be inherent to the problem (e.g., in determining the relationship between genetics
and the presence or absence of a disease) or they may be a result of extreme quantization. A
recent influx of literature has suggested that using prior signal information can greatly improve
the ability to reconstruct a signal from binary measurements. This is exemplified by one-bit
compressed sensing, which takes the compressed sensing model but assumes that only the
sign of each measurement is retained. It has recently been shown that the number of one-bit
measurements required for signal estimation mirrors that of unquantized compressed sensing.
Indeed, $s$-sparse signals in $\mathbb{R}^n$ can be estimated (up to normalization) from $\Omega(s \log(n/s))$ one-bit
measurements. Nevertheless, controlling the precise accuracy of the error estimate remains an
open challenge. In this paper, we focus on optimizing the decay of the error as a function of the
oversampling factor $\lambda := m/(s \log(n/s))$, where $m$ is the number of measurements. It is known
that the error in reconstructing sparse signals from standard one-bit measurements is bounded
below by $\Omega(\lambda^{-1})$. Without adjusting the measurement procedure, improving on this polynomial
error decay rate is impossible. However, we show that an adaptive choice of the thresholds
used for quantization may lower the error rate to $e^{-\Omega(\lambda)}$. This improves upon guarantees for
other methods of adaptive thresholding as proposed in Sigma-Delta quantization. We develop
a general recursive strategy to achieve this exponential decay and two specific polynomial-time
algorithms which fall into this framework, one based on convex programming and one on
hard thresholding. This work is inspired by the one-bit compressed sensing model, in which the
engineer controls the measurement procedure. Nevertheless, the principle is extendable to signal
reconstruction problems in a variety of binary statistical models as well as statistical estimation
problems like logistic regression.

Keywords: compressed sensing, quantization, one-bit compressed sensing, convex optimization,
iterative thresholding, binary regression

1  Introduction 

Many practical acquisition devices in signal processing and algorithms in machine learning use a
small number of linear measurements to represent a high-dimensional signal. Compressed sensing
is a technology which takes advantage of the fact that, for some interesting classes of signals, one
can use far fewer measurements than dictated by the traditional Nyquist sampling paradigm. In this
setting, one obtains $m$ linear measurements of a signal $x \in \mathbb{R}^n$ of the form

$$ y_i = \langle a_i, x \rangle, \quad i = 1, \dots, m. $$

Written concisely, one obtains the measurement vector $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ is the matrix
with rows $a_1, \dots, a_m$. From these (or even from corrupted measurements $y = Ax + e$), one wishes
to recover the signal $x$. To make this problem well-posed, one must exploit a priori information on
the signal $x$, for example that it is $s$-sparse, i.e.,

$$ \|x\|_0 = |\mathrm{supp}(x)| = s \ll n, $$

or is well-approximated by an $s$-sparse signal. After a great deal of research activity in the past
decade (see the website [DSP] or the references in the monographs [EK12, FR13]), it is now well
known that when $A$ consists of, say, independent standard normal entries, one can, with high
probability, recover all $s$-sparse vectors $x$ from the $m \asymp s \log(n/s)$ linear measurements
$y_i = \langle a_i, x \rangle$, $i = 1, \dots, m$.

However, in practice, the compressive measurements $\langle a_i, x \rangle$ must be quantized: one actually
observes $y = Q(Ax)$, where the map $Q : \mathbb{R}^m \to \mathcal{A}^m$ is a quantizer that acts entrywise by mapping
each real-valued measurement to a discrete quantization alphabet $\mathcal{A}$. This type of quantization with
an alphabet $\mathcal{A}$ consisting of only two elements was introduced in the compressed sensing setting by
[BB08] and dubbed one-bit compressed sensing. In this work, we focus on this one-bit approach
and seek quantization schemes $Q$ and reconstruction algorithms $\Delta$ so that $\hat{x} = \Delta(Q(Ax))$ is a
good approximation to $x$. In particular, we are interested in the trade-off between the error of the
approximation and the oversampling factor

$$ \lambda \overset{\mathrm{def}}{=} \frac{m}{s \log(n/s)}. $$
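As a quick numerical illustration of the oversampling factor $\lambda = m/(s \log(n/s))$, the following minimal sketch evaluates it for given problem sizes. The function name `oversampling_factor` is hypothetical, and the use of the natural logarithm is an assumption (the base only changes $\lambda$ by a constant factor).

```python
import math


def oversampling_factor(m: int, n: int, s: int) -> float:
    """Return lambda = m / (s * log(n / s)), using the natural logarithm (an assumption)."""
    if not 0 < s < n:
        raise ValueError("need 0 < s < n so that log(n / s) is positive")
    return m / (s * math.log(n / s))


# Example: an s = 10 sparse signal in dimension n = 1000, measured m = 500 times.
lam = oversampling_factor(m=500, n=1000, s=10)
print(f"lambda = {lam:.3f}")
```

Doubling $m$ at fixed $n$ and $s$ doubles $\lambda$, which is the regime in which the error decay rates discussed in this paper are compared.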

1.1  Motivation  and  previous  work 

The most natural quantization method is Memoryless Scalar Quantization (MSQ), where each
entry of $y = Ax$ is rounded to the nearest element of some quantization alphabet $\mathcal{A}$. If $\mathcal{A} = \delta\mathbb{Z}$ for
some suitably small $\delta > 0$, then this rounding error can be modeled as an additive measurement
error [DPM09], and the recovery algorithm can be fine-tuned to this particular situation [JHF11]. In
the one-bit case, however, the quantization alphabet is $\mathcal{A} = \{\pm 1\}$ and the quantized measurements
take the form $y = \mathrm{sign}(Ax)$, meaning that $\mathrm{sign}^1$ acts entrywise as

$$ y_i = Q_{\mathrm{MSQ}}(\langle a_i, x \rangle) = \mathrm{sign}(\langle a_i, x \rangle), \quad i = 1, \dots, m. $$

Figure 1: Geometric interpretation of one-bit compressed sensing. Each quantized measurement
reveals which side of a hyperplane (or great circle, when restricted to the sphere) the signal $x$
lies on. After several measurements, we know that $x$ lies in one unique region. However, if the
measurements are non-adaptive, then as the region of interest becomes smaller, it becomes less and
less likely that the next measurement yields any new information about $x$.

One-bit compressed sensing was introduced in [BB08], and it has generated a considerable amount
of work since then; see [DSP] for a growing list of literature in this area. Several efficient recovery
algorithms have been proposed, based on linear programming [PV13a, PV13b, GNJN13] and on
modifications of iterative hard thresholding [JLBB13, JDDV13]. As shown in [JLBB13], with high
probability one can perform the reconstruction from one-bit measurements with error

$$ \|x - \hat{x}\|_2 \lesssim \frac{1}{\lambda} \quad \text{for all } x \in \Sigma'_s := \{v \in \mathbb{R}^n : \|v\|_0 \le s, \ \|v\|_2 = 1\}. $$

In other words, a uniform $\ell_2$-reconstruction error of at most $\gamma > 0$ can be achieved with
$m \asymp \gamma^{-1} s \log(n/s)$ one-bit measurements.

Despite the dimension reduction from $n$ to $s \log(n/s)$, MSQ presents substantial limitations
[JLBB13, GVT98]. Precisely, according to [GVT98], even if the support of $x \in \Sigma'_s$ is known,
the best recovery algorithm $\Delta_{\mathrm{opt}}$ must obey

$$ \|x - \Delta_{\mathrm{opt}}(Q_{\mathrm{MSQ}}(Ax))\|_2 \gtrsim \frac{1}{\lambda} \tag{1} $$

up to a logarithmic factor. An intuition for the limited accuracy of MSQ is given in Figure 1.
Alternative quantization schemes have been developed to overcome this drawback. For a specific
signal model and reconstruction algorithm, [SG09] obtained the optimal quantization scheme, but
more general quantization schemes remain open.

Recently, Sigma-Delta quantization schemes have also been proposed as a more general quantization
model [GLP+10, KSY14]. These works show that, with high probability on measurement
matrices with independent subgaussian entries, $r$-th order Sigma-Delta quantization can be applied
to the standard compressed sensing problem to achieve, for any $\alpha \in (0, 1)$, the reconstruction error

$$ \|x - \hat{x}\|_2 \lesssim \lambda^{-\alpha(r - 1/2)} \tag{2} $$

with a number of measurements

$$ m \approx s \, (\log(n/s))^{1/(1-\alpha)}. $$

For suitable choices of $\alpha$ and $r$, the guarantee (2) overcomes the limitation (1), but it is still
polynomial in $\lambda$. This leads us to ask whether an exponential dependence can be achieved.

$^1$ We define $\mathrm{sign}(0) = 1$.


1.2  Our  contributions 

In this work, we focus on improving the trade-off between the error $\|x - \hat{x}\|_2$ and the oversampling
factor $\lambda$. To the best of our knowledge, all quantized compressed sensing schemes obtain guarantees
of the form

$$ \|x - \hat{x}\|_2 \lesssim \lambda^{-c} \quad \text{for all } x \in \Sigma'_s \tag{3} $$

with some constant $c > 0$. We develop one-bit quantizers $Q : \mathbb{R}^m \to \{\pm 1\}^m$, coupled with two
efficient recovery algorithms $\Delta : \{\pm 1\}^m \to \mathbb{R}^n$, that yield the reconstruction guarantee

$$ \|x - \Delta(Q(Ax))\|_2 \le \exp(-\Omega(\lambda)) \quad \text{for all } x \in \Sigma'_s. \tag{4} $$

It is not hard to see that the dependence on $\lambda$ in (4) is optimal, since any method of quantizing
measurements that provides the reconstruction guarantee $\|x - \hat{x}\|_2 \le \gamma$ must use at least
$\log_2 \mathcal{N}(\Sigma'_s, \gamma) \ge s \log_2(1/\gamma)$ bits, where $\mathcal{N}(\cdot)$ denotes the covering number.
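The counting argument behind this optimality claim can be spelled out as follows. This is a sketch that takes the covering-number bound $\mathcal{N}(\Sigma'_s, \gamma) \ge (1/\gamma)^s$ quoted above as given; the intermediate steps are our own rearrangement, not a statement from the references.

```latex
% m one-bit measurements produce at most 2^m distinct bit strings, so any
% recovery map has at most 2^m distinct outputs. If every x in \Sigma'_s is
% within \gamma of some output, the outputs form a \gamma-net of \Sigma'_s:
\[
  2^m \;\ge\; \mathcal{N}(\Sigma'_s, \gamma) \;\ge\; (1/\gamma)^s
  \quad\Longrightarrow\quad
  m \;\ge\; s \log_2(1/\gamma)
  \quad\Longrightarrow\quad
  \gamma \;\ge\; 2^{-m/s}.
\]
% Writing m = \lambda \, s \log(n/s), the best possible error is therefore
\[
  \gamma \;\ge\; 2^{-\lambda \log(n/s)},
\]
% i.e. no scheme can decay faster than exponentially in \lambda
% (for fixed n and s, up to the factor \log(n/s) in the exponent).
```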


1.2.1  Adaptive  measurement  model 

A key element of our approach is that the quantizers are adaptive to previous measurements of the
signal in a manner similar to Sigma-Delta quantization [GLP+10]. In particular, the measurement
matrix $A \in \mathbb{R}^{m \times n}$ is assumed to have independent standard normal entries and the quantized
measurements take the form of thresholded signs, i.e.,

$$ y_i = \mathrm{sign}(\langle a_i, x \rangle - \tau_i) = \begin{cases} 1 & \text{if } \langle a_i, x \rangle \ge \tau_i, \\ -1 & \text{if } \langle a_i, x \rangle < \tau_i. \end{cases} \tag{5} $$

Such measurements are readily implementable in hardware, and they retain the simplicity and
storage benefits of the one-bit compressed sensing model. However, as we will show, this model
is much more powerful in the sense that it permits optimal guarantees of the form (4), which are
impossible with standard MSQ one-bit quantization. As in the Sigma-Delta quantization approach,
we allow the quantizer to be adaptive, meaning that the quantization threshold $\tau_i$ of the $i$th entry
may depend on the 1st, 2nd, ..., $(i-1)$st quantized measurements. In the context of (5), this
means that the thresholds $\tau_i$ will be chosen adaptively, resulting in a feedback loop as depicted in
Figure 2. The thresholds $\tau_i$ can also be interpreted as an additive dither, which is often used in the
theory and practice of analog-to-digital conversion.
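To see in the simplest possible setting why adaptive thresholds can help, consider the following sketch. It is a one-dimensional toy (a scalar $x \in [-1, 1]$ with $a_i = 1$ for every $i$), not one of the algorithms of this paper, and all function names are hypothetical. With thresholds fixed in advance, $m$ bits localize $x$ to an interval of length about $2/m$; with thresholds chosen from the previous bits (bisection), the interval length is $2 \cdot 2^{-m}$.

```python
def sign(t: float) -> int:
    """Sign with the convention sign(0) = 1."""
    return 1 if t >= 0 else -1


def nonadaptive_estimate(x: float, m: int) -> float:
    """Estimate x in [-1, 1] from m bits sign(x - tau_j) with a fixed uniform grid of thresholds."""
    taus = [-1 + 2 * (j + 1) / (m + 1) for j in range(m)]
    bits = [sign(x - tau) for tau in taus]
    k = sum(1 for b in bits if b == 1)      # number of thresholds at or below x
    lo = -1 + 2 * k / (m + 1)               # x lies in [lo, hi)
    hi = -1 + 2 * (k + 1) / (m + 1)
    return (lo + hi) / 2


def adaptive_estimate(x: float, m: int) -> float:
    """Estimate x in [-1, 1] from m bits, each threshold chosen from the earlier bits."""
    lo, hi = -1.0, 1.0
    for _ in range(m):
        tau = (lo + hi) / 2                 # threshold depends on previous measurements
        if sign(x - tau) == 1:
            lo = tau
        else:
            hi = tau
    return (lo + hi) / 2


x_true = 0.3141
for m in (4, 8, 16):
    print(m, abs(nonadaptive_estimate(x_true, m) - x_true),
          abs(adaptive_estimate(x_true, m) - x_true))
```

The non-adaptive error is bounded by $1/(m+1)$ and the adaptive error by $2^{-m}$, mirroring the polynomial versus exponential decay discussed above.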

In contrast to Sigma-Delta quantization, the feedback loop involves the calculation of the
quantization threshold. This is the concession made to arrive at exponentially decaying error rates. It is
an interesting open problem to determine low-memory quantization methods with such error rates
that do not require such a calculation.

Figure 2: Our closed-loop feedback system for binary measurements.


1.2.2  Overview  of  our  main  result 

Our main result is that there is a recovery algorithm using measurements of the form (5) and
providing a guarantee of the form (4). For clarity of exposition, we give a simplified version
of our main result below. The full result is stated in Section 3.

Theorem 1 (Main theorem, simplified version). Let $Q$ and $\Delta$ be the quantization and recovery
algorithms given below in Algorithms 1 and 2, respectively. Suppose that $A \in \mathbb{R}^{m \times n}$ and $\tau \in \mathbb{R}^m$
have independent standard normal entries. Then, with probability at least $1 - C\lambda \exp(-c s \log(n/s))$
over the choice of $A$ and $\tau$, for all $x \in B_2^n$ with $\|x\|_0 \le s$,

$$ \|x - \Delta(Q(Ax, A, \tau))\|_2 \le \exp(-\Omega(\lambda)), \quad \text{where } \lambda = \frac{m}{s \log(n/s)}. $$

The quantization algorithm works iteratively. First, a small batch of measurements is quantized
in a memoryless fashion. From this first batch, one gains a very rough estimate of $x$ (called
$x_1$). The next batch of measurements is quantized with a focus on encoding the difference between
$x$ and $x_1$, and so on. Thus, the trap depicted in Figure 1 is avoided; each hyperplane is
translated with an appropriate dither, with the aim of cutting the size of the feasible region. The
recovery algorithm also works iteratively and its iterates are in fact intertwined with the iterates
of the quantization algorithm. We artificially separate the two algorithms below.

Note that we present Algorithms 1 and 2 at this point mainly because they are the simplest to
state. Below we will provide a more general framework for algorithms that satisfy the guarantees
of Theorem 1 and develop a second set of algorithms with computational advantages.

1.2.3  Robustness 

Our algorithms are robust to two different kinds of measurement corruption. First, they allow for
perturbed linear measurements of the form $\langle a_i, x \rangle + e_i$ for an error vector $e \in \mathbb{R}^m$ with bounded
$\ell_\infty$-norm. Second, they allow for post-quantization sign flips, recorded as a vector $f \in \{\pm 1\}^m$.
Formally, the measurements take the form

$$ y_i = f_i \, \mathrm{sign}(\langle a_i, x \rangle - \tau_i + e_i), \quad i = 1, \dots, m. \tag{6} $$

It is known that for inaccurate measurements with pre-quantization noise on the same order of
magnitude as the signal, even unquantized compressed sensing algorithms must obey a lower bound of
the form (1) [CD13]. Our algorithms respect this reality and exhibit exponentially fast convergence
until the estimate hits the "noise floor", that is, until the error $\|x - \hat{x}\|_2$ is on the order of $\|e\|_\infty$.
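The corrupted measurement model $y_i = f_i \, \mathrm{sign}(\langle a_i, x \rangle - \tau_i + e_i)$ can be written out directly. The sketch below is only an illustration of that formula on a tiny hand-picked example; the function name `quantize` and the example values are hypothetical, and plain Python lists stand in for vectors and matrices.

```python
def sign(t: float) -> int:
    """Sign with the convention sign(0) = 1."""
    return 1 if t >= 0 else -1


def quantize(A, x, tau, e=None, f=None):
    """Return y_i = f_i * sign(<a_i, x> - tau_i + e_i) for each row a_i of A.

    e: pre-quantization additive errors (defaults to all zeros).
    f: post-quantization sign flips in {+1, -1} (defaults to all +1).
    """
    m = len(A)
    e = [0.0] * m if e is None else e
    f = [1] * m if f is None else f
    y = []
    for a_i, tau_i, e_i, f_i in zip(A, tau, e, f):
        inner = sum(a_ij * x_j for a_ij, x_j in zip(a_i, x))
        y.append(f_i * sign(inner - tau_i + e_i))
    return y


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.5, -0.25]
tau = [0.25, 0.0, 0.5]

print(quantize(A, x, tau))                      # clean measurements
print(quantize(A, x, tau, e=[-0.5, 0.0, 0.0]))  # additive error changes the first bit
print(quantize(A, x, tau, f=[1, 1, -1]))        # sign flip changes the last bit
```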

Table 1 summarizes the various noise models, adaptive threshold calculations, and algorithms
we develop and study below.
Algorithm 1: Adaptive quantization

Input: Linear measurements $Ax \in \mathbb{R}^m$; measurement matrix $A \in \mathbb{R}^{m \times n}$; sparsity
    parameter $s$; thresholds $\tau \in \mathbb{R}^m$; parameter $q \ge C s \log(n/s)$ for the size of batches.
Output: Quantized measurements $y \in \{\pm 1\}^m$.

$T \leftarrow \lfloor m/q \rfloor$
Partition $A$ and $\tau$ into $T$ blocks $A^{(1)}, \dots, A^{(T)} \in \mathbb{R}^{q \times n}$ and $\tau^{(1)}, \dots, \tau^{(T)} \in \mathbb{R}^q$.
$x_0 \leftarrow 0$
for $t = 1, \dots, T$ do
    $y^{(t)} \leftarrow \mathrm{sign}(A^{(t)} x - 2^{2-t} \tau^{(t)} - A^{(t)} x_{t-1})$
    $z_t \leftarrow \mathrm{argmin} \, \|z\|_1$ subject to $\|z\|_2 \le 2^{2-t}$, $y_i^{(t)} (\langle a_i^{(t)}, z \rangle - 2^{2-t} \tau_i^{(t)}) \ge 0$ for all $i$
        // $z_t$ is an approximation of $x - x_{t-1}$
    $x_t \leftarrow H_s(x_{t-1} + z_t)$
        // $H_s$ keeps the $s$ largest (in magnitude) entries and zeroes out the rest
return $y^{(t)}$ for $t = 1, \dots, T$    // Notice that we discard the iterates $x_t$


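Algorithm 2: Recovery

Input: Quantized measurements $y \in \{\pm 1\}^m$; measurement matrix $A$; sparsity parameter $s$;
    thresholds $\tau \in \mathbb{R}^m$; size of batches $q$.
Output: Approximation $\hat{x} \in \mathbb{R}^n$.

$T \leftarrow \lfloor m/q \rfloor$
Partition $A$ and $\tau$ into $T$ blocks $A^{(1)}, \dots, A^{(T)} \in \mathbb{R}^{q \times n}$ and $\tau^{(1)}, \dots, \tau^{(T)} \in \mathbb{R}^q$.
$x_0 \leftarrow 0$
for $t = 1, \dots, T$ do
    $z_t \leftarrow \mathrm{argmin} \, \|z\|_1$ subject to $\|z\|_2 \le 2^{2-t}$, $y_i^{(t)} (\langle a_i^{(t)}, z \rangle - 2^{2-t} \tau_i^{(t)}) \ge 0$ for all $i$
    $x_t \leftarrow H_s(x_{t-1} + z_t)$
return $x_T$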

1.2.4  Relationship  to  binary  regression 

Our one-bit adaptive quantization and reconstruction algorithms are more broadly applicable to a
certain kind of statistical classification problem related to sparse binary regression, and in particular
sparse logistic and probit regression. These techniques are often used to explain statistical data in
which the response variable is binary. In regression, it is common to assume that the data $\{z_i\}$
is generated according to the generalized linear model, where $z_i \in \{0, 1\}$ is a Bernoulli random
variable satisfying

$$ \mathbb{E}[z_i] = f(\langle a_i, x \rangle) \tag{7} $$

for some function $f : \mathbb{R} \to [0, 1]$. The generalized linear model is equivalent to the noisy one-bit
compressed sensing model when the measurements are $y_i = 2 z_i - 1 \in \{\pm 1\}$ and

$$ \mathbb{P}(y_i = 1) = f(\langle a_i, x \rangle), $$

or equivalently, when

$$ y_i = \mathrm{sign}(\langle a_i, x \rangle + e_i) $$

with $f(t) := \mathbb{P}(e_i \ge -t)$. In summary, one-bit compressed sensing is equivalent to binary regression
as long as $f$ is the cumulative distribution function (CDF) of the noise variable $e_i$. The most
commonly used CDFs in binary regression are the inverse logistic link function $f(t) = \frac{e^t}{e^t + 1}$ in
logistic regression and the inverse probit link function $f(t) = \int_{-\infty}^{t} \mathcal{N}(s) \, ds$ in probit regression,
where $\mathcal{N}$ denotes the standard Gaussian density.
These cases correspond to the noise variable $e_i$ being logistic and Gaussian distributed, respectively.

Table 1: Summary of the noise models, adaptive threshold calculations, and algorithms considered.
See Section 2 for further discussion of the trade-offs between the two algorithms.

Noise model | Threshold algorithm | Recovery algorithm
Additive error $e_i$ in (6) | Algorithm 7 instantiated by Algorithm 3 | Convex programming: Algorithm 8 instantiated by Algorithm 4
Additive error $e_i$ and sign flips $f_i$ in (6) | Algorithm 7 instantiated by Algorithm 5 | Iterative hard thresholding: Algorithm 8 instantiated by Algorithm 6

The new twist here is that the quantization thresholds are selected adaptively; see Section 6.1
for some examples. Specifically, our adaptive threshold measurement model is equivalent to the
adaptive binary regression model

$$ y_i = \mathrm{sign}(\langle a_i, x \rangle + e_i - \tau_i) $$

with

$$ \mathbb{P}(y_i = 1) = \mathbb{P}(e_i - \tau_i \ge -t) = f(t - \tau_i), \quad \text{where } t = \langle a_i, x \rangle. $$

The effect of $\tau_i$ in this adaptive binary regression is equivalent to an offset term added to all
measurements $y_i$. Standard binary regression corresponds to the special case with $\tau_i = 0$.
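The two link functions and the effect of a threshold as an offset can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and the probit link is written through the error function, using the standard identity $\Phi(t) = \tfrac{1}{2}(1 + \mathrm{erf}(t/\sqrt{2}))$ for the standard Gaussian CDF.

```python
import math


def logistic_link(t: float) -> float:
    """Inverse logistic link f(t) = e^t / (e^t + 1), evaluated in a numerically stable way."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    et = math.exp(t)
    return et / (1.0 + et)


def probit_link(t: float) -> float:
    """Inverse probit link: the standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_plus_one(t: float, tau: float, link) -> float:
    """P(y_i = 1) = f(t - tau) in the adaptive model, where t = <a_i, x>."""
    return link(t - tau)


# With tau = 0 this is standard binary regression; a positive threshold lowers P(y_i = 1).
print(prob_plus_one(1.0, 0.0, logistic_link), prob_plus_one(1.0, 0.5, logistic_link))
print(prob_plus_one(1.0, 0.0, probit_link), prob_plus_one(1.0, 0.5, probit_link))
```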

1.3  Organization 

In Section 2, we introduce two methods to recover not only the direction, but also the magnitude,
of a signal from one-bit compressed sensing measurements of the form (5). These methods may be
of independent interest (in one-bit compressed sensing, only the direction can be recovered), but
they do not exhibit the exponential decay in the error that we seek. In Section 3, we will show how
to use these schemes as building blocks to obtain (4). The proofs of all of our results are given in
Section 4. In Section 5, we present some numerical results for the new algorithms. We conclude in
Section 6 with a brief summary.

1.4  Notation 

Throughout the paper, we use the standard notation $\|v\|_2 = \sqrt{\sum_i v_i^2}$ for the $\ell_2$-norm of a vector
$v \in \mathbb{R}^n$, $\|v\|_1 = \sum_i |v_i|$ for its $\ell_1$-norm, and $\|v\|_0$ for its number of nonzero entries. A vector $v$ is
called $s$-sparse if $\|v\|_0 \le s$ and effectively $s$-sparse if $\|v\|_1 \le \sqrt{s} \, \|v\|_2$. We write $H_s(v)$ to represent
the vector in $\mathbb{R}^n$ agreeing with $v$ on the index set of the $s$ largest entries of $v$ (in magnitude) and with
the zero vector elsewhere. We use a prime to indicate $\ell_2$-normalization, so that $H'_s(v)$ is defined as
$H'_s(v) := H_s(v) / \|H_s(v)\|_2$. The set $\Sigma_s := \{v \in \mathbb{R}^n : \|v\|_0 \le s\}$ of $s$-sparse vectors is accompanied
by the set $\Sigma'_s := \{v \in \mathbb{R}^n : \|v\|_0 \le s, \ \|v\|_2 = 1\}$ of $\ell_2$-normalized $s$-sparse vectors. For $R > 0$, we
write $R\Sigma'_s$ to mean the set $\{v \in \mathbb{R}^n : \|v\|_0 \le s, \ \|v\|_2 = R\}$. We also write $B_2^n = \{v \in \mathbb{R}^n : \|v\|_2 \le 1\}$
for the $\ell_2$-ball in $\mathbb{R}^n$ and $R B_2^n$ for the appropriately scaled version. We consider the task of
recovering $x \in \Sigma_s$ from measurements of the form (5) or (6) for $i = 1, \dots, m$. These measurements
are organized as a matrix $A \in \mathbb{R}^{m \times n}$ with rows $a_1, \dots, a_m$ and a vector $\tau \in \mathbb{R}^m$ of thresholds.
Matching the Sigma-Delta quantization model, the $a_i \in \mathbb{R}^n$ may be random but are non-adaptive,
while the $\tau_i \in \mathbb{R}$ may be chosen adaptively, in either a random or deterministic fashion. The
Hamming distance between sign vectors $y, \tilde{y} \in \{\pm 1\}^m$ is defined as $d_H(y, \tilde{y}) = \sum_i \mathbf{1}_{\{y_i \ne \tilde{y}_i\}}$.
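The operators $H_s$, $H'_s$ and the Hamming distance $d_H$ defined above are simple to state in code. The following is a minimal sketch on plain Python lists; the function names are hypothetical, and ties among equal magnitudes are broken by index order, which is an arbitrary choice.

```python
import math


def hard_threshold(v, s):
    """H_s(v): keep the s largest-magnitude entries of v and zero out the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:s])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def normalized_hard_threshold(v, s):
    """H'_s(v) = H_s(v) / ||H_s(v)||_2 (undefined when H_s(v) = 0)."""
    h = hard_threshold(v, s)
    norm = math.sqrt(sum(c * c for c in h))
    if norm == 0.0:
        raise ValueError("H_s(v) is the zero vector, so it cannot be normalized")
    return [c / norm for c in h]


def hamming_distance(y, y_tilde):
    """d_H(y, y~): number of positions where two sign vectors differ."""
    return sum(1 for a, b in zip(y, y_tilde) if a != b)


v = [3.0, -4.0, 0.5, 1.0]
print(hard_threshold(v, 2))
print(normalized_hard_threshold(v, 2))
print(hamming_distance([1, -1, 1, 1], [1, 1, 1, -1]))
```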

2  Magnitude  recovery 

Given an $s$-sparse vector $x \in \mathbb{R}^n$, several convex programs are provably able to extract an accurate
estimate of the direction of $x$ from $\mathrm{sign}(Ax)$ or $\mathrm{sign}(Ax + e)$ [PV13b, PV13a]. However, recovery of
the magnitude of $x$ is challenging in this setting [KSW14]. Indeed, all magnitude information about
$x$ is lost in measurements of the form $\mathrm{sign}(Ax)$. Fortunately, if random (non-adaptive) dither is
added before quantization, then magnitude recovery becomes possible, i.e., noise can actually help
with signal reconstruction. This observation has also been made in the concurrently written paper
[KSW14] and in the literature on binary regression in statistics [DPvdBW14].
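The loss of magnitude information, and the way a dither restores it, can be seen on a tiny deterministic example. In the sketch below the matrix, the signal, and the thresholds are hand-picked for illustration (a fixed threshold vector stands in for one realization of a random dither), and the function name `one_bit` is hypothetical.

```python
def sign(t: float) -> int:
    """Sign with the convention sign(0) = 1."""
    return 1 if t >= 0 else -1


def one_bit(A, x, tau=None):
    """Return sign(<a_i, x> - tau_i) for each row a_i; tau defaults to zero (no dither)."""
    tau = [0.0] * len(A) if tau is None else tau
    return [sign(sum(a * c for a, c in zip(row, x)) - t) for row, t in zip(A, tau)]


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x = [0.25, 0.125]
x_scaled = [3 * c for c in x]        # same direction, three times the magnitude
tau = [0.5, 0.5, 0.5, 0.5]           # stand-in for one draw of a dither

# Without dither, the bits cannot distinguish x from 3x.
print(one_bit(A, x), one_bit(A, x_scaled))
# With dither, the two signals produce different bits.
print(one_bit(A, x, tau), one_bit(A, x_scaled, tau))
```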

Our  main  result  will  show  that  both  the  magnitude  and  direction  of  x  can  be  estimated  with 
exponentially  small  error  bounds.  In  this  section,  we  first  lay  the  groundwork  for  our  main  result 
by  developing  two  methods  for  one-bit  signal  acquisition  and  reconstruction  that  provide  accurate 
reconstruction  of  both  the  magnitude  and  direction  of  x  with  polynomially  decaying  error  bounds. 

We  propose  two  different  order-one  recovery  schemes.  The  first  is  based  on  second-order  cone 
programming  and  is  simpler  but  more  computationally  intensive.  The  second  is  based  on  hard 
thresholding,  is  faster,  and  is  able  to  handle  a  more  general  noise  model  (in  particular,  random  sign 
flips of the measurements) but requires an adaptive dither. Recall Table 1.

2.1  Second-order  cone  programming 

The size of the appropriate dither/threshold depends on the magnitude of $x$. Thus, let $R > 0$
satisfy $\|x\|_2 \le R$. We take measurements of the form

$$ y_i = \mathrm{sign}(\langle a_i, x \rangle - \tau_i + e_i), \quad i = 1, \dots, q, \tag{8} $$

where $\tau_1, \dots, \tau_q \sim \mathcal{N}(0, 4R^2)$ are known independent normally distributed dithers that are also
independent of the rows $a_1, \dots, a_q$ of the matrix $A$, and $e_1, \dots, e_q$ are small deterministic errors
(possibly adversarial) satisfying $|e_i| \le cR$ for an absolute constant $c$. The following second-order
cone program

$$ \mathrm{argmin} \, \|z\|_1 \quad \text{subject to} \quad \|z\|_2 \le 2R, \quad y_i (\langle a_i, z \rangle - \tau_i) \ge 0 \ \text{ for all } i = 1, \dots, q \tag{9} $$

provides a good estimate of $x$, as formally stated below.


Algorithm 3: $T_0$: Threshold production for second-order cone programming
Input: Bound $R$ on $\|x\|_2$
Output: Thresholds $\tau \in \mathbb{R}^q$
return $\tau \sim \mathcal{N}(0, 4R^2 I_q)$


Algorithm 4: $\Delta_0$: Recovery procedure for second-order cone programming
Input: Quantized measurements $y \in \{\pm 1\}^q$; measurement matrix $A \in \mathbb{R}^{q \times n}$; bound $R$ on
    $\|x\|_2$; thresholds $\tau \in \mathbb{R}^q$
Output: Approximation $\hat{x}$

return $\mathrm{argmin} \, \|z\|_1$ subject to $\|z\|_2 \le 2R$, $y_i (\langle a_i, z \rangle - \tau_i) \ge 0$ for all $i = 1, \dots, q$.


Theorem 2. Let $1 \ge \delta > 0$, let $A \in \mathbb{R}^{q \times n}$ have independent standard normal entries, and let
$\tau_1, \dots, \tau_q \in \mathbb{R}$ be independent normal variables with variance $4R^2$. Suppose that $n \ge 2q$ and

$$ q \ge C' \delta^{-4} s \log(n/s). $$

Then, with probability at least $1 - 3 \exp(-c_0 \delta^4 q)$ over the choice of $A$ and the dithers $\tau_1, \dots, \tau_q$,
the following holds for all $x \in R B_2^n \cap \Sigma_s$ and $e \in \mathbb{R}^q$ satisfying $\|e\|_\infty \le c \delta^3 R$: for $y$ obeying the
measurement model (8), the solution $\hat{x}$ to (9) satisfies

$$ \|x - \hat{x}\|_2 \le \delta R. $$

The positive constants $C'$, $c$, and $c_0$ above are absolute constants.

Remark 1. The choice of the constraint $\|z\|_2 \le 2R$ and the variance $4R^2$ for the $\tau_i$'s allows for
the above theoretical guarantees in the presence of pre-quantization error $e \ne 0$. However, in the
ideal case $e = 0$, the guarantees also hold if we impose $\|z\|_2 \le R$ and take a variance of $R^2$. This
more natural choice seems to give better results in practice, even in the presence of pre-quantization
error (as $R$ was already an overestimation for $\|x\|_2$). This is the route followed in the numerical
experiments of Section 5. It only requires changing $2^{2-t}$ to $2^{1-t}$ in Algorithms 1 and 2.

To fit into our general framework for exponential error decay, it is helpful to think of the program
(9) as two separate algorithms: an algorithm $T_0$ that produces thresholds and an algorithm $\Delta_0$ that
performs the recovery. These are formally described in Algorithms 3 and 4.

2.2  Hard  thresholding 

The convex programming approach is attractive in many respects; in particular, the
thresholds/dithers $\tau_i$ are non-adaptive, which makes them especially easy to apply in hardware. However,
the recovery algorithm $\Delta_0$ in Algorithm 4 can be costly. Further, while the convex programming
approach can handle additive pre-quantization error, it cannot necessarily handle post-quantization
error (sign flips). In this section, we present an alternative scheme for estimating magnitude, based
on iterative hard thresholding, that addresses these challenges. The only downside is that the
thresholds/dithers $\tau_i$ become adaptive within the order-one recovery scheme.

Given an $s$-sparse vector $x \in \mathbb{R}^n$, one can easily extract from $\mathrm{sign}(Ax)$ a good estimate for the
direction of $x$. For example, we will see that $H'_s(A^* \mathrm{sign}(Ax))$ is a good approximation of $x / \|x\|_2$.
Algorithm 5: $T_0$: Threshold production for hard thresholding
Input: Measurements $Ax \in \mathbb{R}^q$; measurement matrix $A \in \mathbb{R}^{q \times n}$; sparsity parameter $s$;
    bound $R$ on $\|x\|_2$.
Output: Thresholds $\tau \in \mathbb{R}^q$

Partition $Ax$ into $A_1 x, A_2 x \in \mathbb{R}^{q/2}$.
$u \leftarrow H'_s(A_1^* \mathrm{sign}(A_1 x))$
$v \leftarrow V(u)$
$w \leftarrow 2R \cdot (u + v)$
return $\tau = (0, A_2 w)$ with $0 \in \mathbb{R}^{q/2}$, $A_2 w \in \mathbb{R}^{q/2}$


Algorithm 6: $\Delta_0$: Recovery procedure for hard thresholding
Input: Quantized measurements $y \in \{\pm 1\}^q$; measurement matrix $A \in \mathbb{R}^{q \times n}$; sparsity
    parameter $s$; bound $R$ on $\|x\|_2$.
Output: Approximation $\hat{x}$

Partition $y$ into $y_1, y_2 \in \mathbb{R}^{q/2}$.
$u \leftarrow H'_s(A_1^* y_1)$
$v \leftarrow V(u)$
$t \leftarrow -H'_s(A_2^* y_2)$
return $2R f(\langle t, v \rangle) \cdot u$, where $f(\xi) = 1 - \frac{\sqrt{1 - \xi^2}}{\xi}$


However, as mentioned earlier, there is no hope of recovering the magnitude $\|x\|_2$ of the signal
from $\mathrm{sign}(Ax)$. To get around this, we use a second estimator, this time for the direction of $x - z$
for a well-chosen vector $z \in \mathbb{R}^n$, obtained by computing $H'_s(A^* \mathrm{sign}(A(x - z)))$. This allows us to
estimate both the direction and the magnitude of $x$.
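The geometry behind this two-direction trick can be checked in an idealized, noise-free setting. The sketch below assumes that both directions are known exactly, that the shift is $w = 2R(u + v)$ with $u = x/\|x\|_2$ and $v$ a unit vector orthogonal to $u$, that $t$ is the direction of $w - x$, and that the magnitude is read off through $f(\xi) = 1 - \sqrt{1 - \xi^2}/\xi$. These conventions are our reading of the hard thresholding scheme; the function names and the two-dimensional example are hypothetical.

```python
import math


def f(xi: float) -> float:
    """Assumed magnitude map f(xi) = 1 - sqrt(1 - xi^2) / xi, for 0 < xi <= 1."""
    return 1.0 - math.sqrt(1.0 - xi * xi) / xi


def magnitude_from_directions(v, t, R: float) -> float:
    """Return 2R * f(<t, v>), the magnitude estimate built from two direction estimates."""
    xi = sum(a * b for a, b in zip(t, v))
    return 2.0 * R * f(xi)


# Idealized example in the plane: x = 0.3 * u with ||x||_2 <= R = 1.
R = 1.0
u = [1.0, 0.0]                      # exact direction of x
v = [0.0, 1.0]                      # unit vector orthogonal to u
x = [0.3, 0.0]
w = [2.0 * R * (a + b) for a, b in zip(u, v)]
d = [wi - xi for wi, xi in zip(w, x)]
norm_d = math.sqrt(sum(c * c for c in d))
t = [c / norm_d for c in d]         # exact direction of w - x

print(magnitude_from_directions(v, t, R))   # close to ||x||_2 = 0.3
```

The identity follows from $\langle t, v \rangle = 2R / \|w - x\|_2$ and $\langle t, u \rangle = (2R - \|x\|_2) / \|w - x\|_2$, whose ratio gives $1 - \|x\|_2 / (2R)$.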

As above, we break the measurement/recovery process into two separate algorithms. The first
is an algorithm $T_0$ describing how to generate the thresholds $\tau_i$. The second is a recovery algorithm
$\Delta_0$ that describes how to recover an approximation $\hat{x}$ to $x$ based on measurements of the form (6),
using the $\tau_i$ as thresholds. These are formally described in Algorithms 5 and 6. In the algorithm
statements, $V$ denotes any fixed rule associating to a vector $u$ an $\ell_2$-normalized vector $V(u)$ that
is both orthogonal to $u$ and has the same support.

The analysis for $T_0$ and $\Delta_0$ relies on the following theorems.


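Theorem 3. Let $1 \ge \delta > 0$ and let $A \in \mathbb{R}^{q \times n}$ have independent standard normal entries. Suppose
that $n \ge 2q$ and $q \ge c_1 \delta^{-7} s \log(n/s)$. Then, with probability at least $1 - c_2 \exp(-c_3 \delta^2 q)$ over the
choice of $A$, the following holds for all $s$-sparse $x \in \mathbb{R}^n$, all $e \in \mathbb{R}^q$ with $\|e\|_2 \le c_6 \sqrt{q} \, \|x\|_2$, and all
$y \in \{\pm 1\}^q$:

$$ \left\| \frac{x}{\|x\|_2} - H'_s(A^* y) \right\|_2 \le \delta + c_4 \frac{\|e\|_2}{\sqrt{q} \, \|x\|_2} + c_5 \sqrt{\frac{d_H(y, \mathrm{sign}(Ax + e))}{q}}. \tag{10} $$

The positive constants $c_1$, $c_2$, $c_3$, $c_4$, $c_5$, and $c_6$ above are absolute constants.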

The proof of Theorem 3 is given in Section 4. Once Theorem 3 is shown, we will be able to
establish the following results when the threshold production and recovery procedures $T_0$ and $\Delta_0$
are given by Algorithms 5 and 6.

Theorem 4. Let $1 \ge \delta > 0$, let $A \in \mathbb{R}^{q \times n}$ have independent standard normal entries, and let $T_0$
and $\Delta_0$ be as in Algorithms 5 and 6. Suppose that $n \ge 2q$ and

$$ q \ge c_1 \delta^{-7} s \log(n/s). $$

Further assume that whenever a signal $z$ is measured, the corruption errors satisfy $\|e\|_\infty \le c \delta \|z\|_2$
and $|\{i : f_i = -1\}| \le c' \delta q$. Then, with probability at least $1 - c_7 \exp(-c_8 \delta^2 q)$ over the choice of
$A$, the following holds for all $x \in R B_2^n \cap \Sigma_s$: for $y$ obeying the measurement model (6) with
$\tau = T_0(Ax, A, s, R)$, the vector $\hat{x} = \Delta_0(y, A, s, R)$ satisfies

$$ \|x - \hat{x}\|_2 \le \delta R. $$

The positive constants $c_1$, $c$, $c'$, $c_7$, and $c_8$ above are absolute constants.

Having  proposed  two  methods  for  recovering  both  the  direction  and  magnitude  of  a  sparse 
vector  from  binary  measurements,  we  now  turn  to  our  main  result. 

3  Exponential  decay:  General  framework 

In  the  previous  section,  we  developed  two  methods  for  approximately  recovering  x  from  binary 
measurements.  Unfortunately,  these  methods  exhibit  polynomial  error  decay  in  the  oversampling 
factor,  and  our  goal  is  to  obtain  an  exponential  decay.  We  can  achieve  this  goal  by  applying  the 
rough  estimation  methods  iteratively,  in  batches,  with  adaptive  thresholds/dithers.  As  we  show 
below,  this  leads  to  an  extremely  accurate  recovery  scheme.  To  make  this  framework  precise,  we 
first  define  an  order-one  recovery  scheme  (To,  Ao). 

Definition 5 (Order-one recovery scheme). An order-one recovery scheme with sparsity parameter
$s$, measurement complexity $q$, and noise resilience $(\eta, b)$ is a pair of algorithms $(T_0, \Delta_0)$ such that:

• The thresholding algorithm $T_0$ takes a parameter $R$ and, optionally, a set of linear measurements
$Ax \in \mathbb{R}^q$ and the measurement matrix $A \in \mathbb{R}^{q \times n}$. It outputs a set of thresholds
$\tau \in \mathbb{R}^q$.

• The recovery algorithm $\Delta_0$ takes $q$ corrupted quantized measurements of the form (6), i.e.,

\[ y_i = f_i \, \mathrm{sign}(\langle a_i, x \rangle - \tau_i + e_i), \]

where $e \in \mathbb{R}^q$ is a pre-quantization error and $f \in \{\pm 1\}^q$ is a post-quantization error. It also
takes as input the measurement matrix $A \in \mathbb{R}^{q \times n}$, a parameter $R$, and, optionally, a sparsity
parameter $s$ and the thresholds $\tau$ returned by $T_0$. It outputs a vector $\hat{x} \in \mathbb{R}^n$.

• With probability at least $1 - C\exp(-cq)$ over the choice of $A \in \mathbb{R}^{q \times n}$ and the randomness
of $T_0$, the following holds: for all $x \in R B_2^n \cap \Sigma_s$, all $e \in \mathbb{R}^q$ with $\|e\|_\infty \le \eta \|x\|_2$, and all
$f \in \{\pm 1\}^q$ with at most $b$ sign flips, the estimate $\hat{x} = \Delta_0(y, A, R, s, \tau)$ satisfies

\[ \|x - \hat{x}\|_2 \le \frac{R}{4}. \]




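Algorithm 7: $Q$: Quantization

Input: Linear measurements $Ax \in \mathbb{R}^m$; measurement matrix $A \in \mathbb{R}^{m \times n}$; sparsity
parameter $s$; bound $R$ on $\|x\|_2$; parameter $q \ge C s \log(n/s)$ for the size of batches.
Output: Quantized measurements $y \in \{\pm 1\}^m$ and thresholds $\tau \in \mathbb{R}^m$.

$T \leftarrow \lfloor m/q \rfloor$
Partition $A$ into $T$ blocks $A^{(1)}, \ldots, A^{(T)} \in \mathbb{R}^{q \times n}$
$x_0 \leftarrow 0$
for $t = 1, \ldots, T$ do
    $R_t = R\,2^{-t+1}$
    $\tau^{(t)} \leftarrow T_0(A^{(t)}(x - x_{t-1}), A^{(t)}, R_t)$
    $\sigma^{(t)} \leftarrow A^{(t)} x_{t-1} + \tau^{(t)}$
    $y^{(t)} = f^{(t)} \circ \mathrm{sign}(A^{(t)} x - \sigma^{(t)} + e^{(t)})$
    $x_t \leftarrow H_s(x_{t-1} + \Delta_0(y^{(t)}, A^{(t)}, R_t, \tau^{(t)}))$
return $y^{(t)}, \tau^{(t)}$ for $t = 1, \ldots, T$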


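Algorithm 8: $\Delta$: Recovery

Input: Quantized measurements $y \in \{\pm 1\}^m$; measurement matrix $A \in \mathbb{R}^{m \times n}$; sparsity
parameter $s$; bound $R$ on $\|x\|_2$; thresholds $\tau \in \mathbb{R}^m$; size of batches $q$.
Output: Approximation $\hat{x} \in \mathbb{R}^n$

$T \leftarrow \lfloor m/q \rfloor$
Partition $A$ into $T$ blocks $A^{(1)}, \ldots, A^{(T)} \in \mathbb{R}^{q \times n}$
$x_0 \leftarrow 0$
for $t = 1, \ldots, T$ do
    $x_t \leftarrow H_s(x_{t-1} + \Delta_0(y^{(t)}, A^{(t)}, R\,2^{-t+1}, \tau^{(t)}))$    (11)
return $x_T$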


We saw two examples of order-one recovery schemes in Section 2. The scheme based on second-order
cone programming is an order-one recovery scheme with sparsity parameter $s$, measurement
complexity $q = C_0 s \log(n/s)$, and noise resilience $\eta = c_0 R$ and $b = 0$. The scheme based on
iterated hard thresholding is an order-one recovery scheme with sparsity parameter $s$, measurement
complexity $q = C_1 s \log(n/s)$, and noise resilience $\eta = c_1 R$ and $b = c_2 q$. Above, $c_0, c_1, c_2, C_0, C_1 > 0$
are absolute constants.

We use an order-one recovery scheme to build a pair of one-bit quantization and recovery algorithms
for sparse vectors that exhibits extremely fast convergence. Our quantization and recovery
algorithms $Q$ and $\Delta$ are given in Algorithms 7 and 8, respectively. They are in reality intertwined,
but again we separate them for expositional clarity.

The intuition motivating Step (11) is that $\Delta_0(y^{(t)}, A^{(t)}, R_t, \tau^{(t)})$ estimates $x - x_{t-1}$; hence $x_t$
approximates $x$ better than $x_{t-1}$ does. Note the similarity to the intuition motivating iterative
hard thresholding, with the key difference being that the quantization is also performed iteratively.
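The batch recursion shared by Algorithms 7 and 8 can be sketched in Python as follows. This is a minimal illustration in the noiseless case, not the authors' MATLAB code: the function names (`quantize_and_recover`, `grid_thresholds`, `grid_recover`) are hypothetical, and the order-one scheme plugged in is a toy one-dimensional stand-in (a uniform grid of dithers on $[-R_t, R_t]$), chosen only because it provably meets the $R_t/4$ guarantee when $n = 1$ and $q \ge 5$. It is not the second-order cone or hard-thresholding scheme of Section 2.

```python
import numpy as np

def hard_threshold(z, s):
    """H_s: keep the s largest-magnitude entries of z and zero out the rest."""
    out = np.zeros_like(z)
    keep = np.argpartition(np.abs(z), -s)[-s:]
    out[keep] = z[keep]
    return out

def quantize_and_recover(x, A, s, R, q, threshold_fn, recover_fn):
    """Intertwined loops of Algorithms 7 and 8, noiseless case (assumes m >= q).

    threshold_fn(A_t @ (x - x_prev), A_t, R_t) plays the role of T_0;
    recover_fn(y_t, A_t, R_t, tau_t) plays the role of Delta_0.
    """
    m, n = A.shape
    T = m // q
    x_t = np.zeros(n)
    ys, taus = [], []
    for t in range(1, T + 1):
        A_t = A[(t - 1) * q : t * q]           # t-th block of q rows
        R_t = R * 2.0 ** (-t + 1)              # current bound on ||x - x_{t-1}||_2
        tau_t = threshold_fn(A_t @ (x - x_t), A_t, R_t)
        sigma_t = A_t @ x_t + tau_t            # effective thresholds applied to A_t x
        y_t = np.sign(A_t @ x - sigma_t)       # one-bit measurements of this batch
        x_t = hard_threshold(x_t + recover_fn(y_t, A_t, R_t, tau_t), s)
        ys.append(y_t)
        taus.append(tau_t)
    return x_t, np.concatenate(ys), np.concatenate(taus)

# --- Toy order-one scheme for n = 1 (an assumption made for illustration) ---

def grid_thresholds(_residual_meas, A_t, R_t):
    """Toy T_0: dithers a_i * c_i with c_i on a uniform grid of [-R_t, R_t]."""
    grid = np.linspace(-R_t, R_t, A_t.shape[0])
    return A_t[:, 0] * grid

def grid_recover(y_t, A_t, R_t, tau_t):
    """Toy Delta_0: sign(a_i) * y_i = sign(r - c_i) brackets the residual r."""
    grid = np.linspace(-R_t, R_t, A_t.shape[0])
    signs = np.sign(A_t[:, 0]) * y_t
    left = grid[signs > 0]                     # grid points below the residual
    right = grid[signs < 0]                    # grid points above the residual
    lo = left.max() if left.size else -R_t
    hi = right.min() if right.size else R_t
    return np.array([(lo + hi) / 2.0])
```

With batches of size $q = 8$, the toy scheme localizes the residual to within $R_t/7$, so each batch at least halves the error bound, as in the induction behind Theorem 6. Swapping in a genuine order-one scheme only requires replacing the two callables.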

Remark  2  (Computational  and  storage  considerations).  Let  us  analyze  the  storage  requirements 
and  computational  complexity  of  Q  and  A,  both  during  and  after  quantization. 
We begin by considering the approach based on convex programming. In this case, the final
storage requirements of the quantizer $Q$ are similar to those in standard one-bit compressed sensing.
The “algorithm” $T_0$ is straightforward: it simply draws random thresholds/dithers. In particular,
we may treat these thresholds as predetermined independent normal random variables in the same
way as we treat $A$. If $A$ and $\tau$ are generated by a short seed, then all that needs to be stored after
quantization are the binary measurements $y \in \{\pm 1\}^q$. During quantization, the algorithm $Q$ needs
to store $x_t$. However, this requires little memory since $x_t$ is $s$-sparse.

While the convex programming approach is designed to ease storage burdens, the order-one
recovery scheme based on hard thresholding is built for speed. In this case, the threshold algorithm
$T_0$ (Algorithm 5) is more complicated, and the adaptive thresholds $\tau$ need to be stored. On the other
hand, the computation of $x_t$ is much faster, and both the quantization and recovery algorithms are
very efficient.
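To illustrate the storage remark for the convex-programming variant, here is a small Python sketch in which the matrix and the dithers are rebuilt from a seed and only the sign bits are stored. The helper names (`regenerate`, `pack_bits`, `unpack_bits`) are hypothetical, and drawing the dithers as unscaled standard normal variables is an assumption made for illustration; the scaling used in Section 2 may differ.

```python
import numpy as np

def regenerate(seed, m, n):
    """Rebuild the Gaussian matrix A and the Gaussian dithers tau from a seed."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    tau = rng.standard_normal(m)   # assumption: unscaled standard normal dithers
    return A, tau

def pack_bits(y):
    """Store y in {-1, +1}^m using one bit per measurement."""
    return np.packbits(y > 0)

def unpack_bits(packed, m):
    """Invert pack_bits: map stored bits back to a vector in {-1, +1}^m."""
    return 2.0 * np.unpackbits(packed, count=m) - 1.0
```

The decoder then needs only the seed, the dimensions, and the packed bits, since calling `regenerate` with the same seed reproduces the same `A` and `tau`.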

Given an order-one recovery scheme $(T_0, \Delta_0)$, the quantizer $Q$ given in Algorithm 7 and the
recovery algorithm $\Delta$ given in Algorithm 8 have the desired exponential convergence rate. This is
formally stated in the theorem below and proved in Section 4.

Theorem 6. Let $(T_0, \Delta_0)$ be an order-one recovery scheme with sparsity parameter $2s$, measurement
complexity $q$, and noise resilience $(\eta, b)$. Fix $R > 0$ and recall that $T := \lfloor m/q \rfloor$. With
probability at least $1 - CT\exp(-cq)$ over the choice of $A$ and the randomness of $T_0$, the following
holds for all $x \in R B_2^n \cap \Sigma_s$, all $e \in \mathbb{R}^m$ with $\|e\|_\infty \le \eta 2^{-T} \|x\|_2$, and all $f \in \{\pm 1\}^m$ with
$|\{i : f_i = -1\}| \le b$ in the measurement model (6):

for $y \in \{\pm 1\}^m$ and $\tau = Q(Ax, A, s, R, q) \in \mathbb{R}^m$, the output $\hat{x}$ of $\Delta(y, A, s, R, \tau, q)$ satisfies

\[ \|x - \hat{x}\|_2 \le R\,2^{-T}. \tag{12} \]

The positive constants $\eta$, $b$, $c$, and $C$ above are absolute constants.

Our  two  order-one  recovery  schemes  each  have  measurement  complexity  q  =  Cslog(n/s).  This 
implies  the  announced  exponential  decay  in  the  error  rate. 
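As a short worked step (a sketch that treats $q = C s\log(n/s)$ as exact and assumes $\lambda \ge 2C$, conditions not spelled out in the text), the passage from (12) to an exponential rate in $\lambda = m/(s\log(n/s))$ reads:

\[
T = \left\lfloor \frac{m}{q} \right\rfloor = \left\lfloor \frac{\lambda}{C} \right\rfloor \ge \frac{\lambda}{C} - 1 \ge \frac{\lambda}{2C}
\quad\Longrightarrow\quad
R\,2^{-T} \le R\,2^{-\lambda/(2C)} .
\]

In this sketch the constant in the exponent is $1/(2C)$, and the failure probability $CT\exp(-cq)$ of Theorem 6 becomes at most $\lambda \exp(-cC s\log(n/s))$, since $T \le \lambda/C$.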

Corollary 7. Let $Q$, $\Delta$ be as in Algorithms 7 and 8 with one-bit recovery schemes $(T_0, \Delta_0)$ given
either by Algorithms (3, 4) or (5, 6). Let $A \in \mathbb{R}^{m \times n}$ have independent standard normal entries. Fix
$R > 0$ and recall that $\lambda = m/(s\log(n/s))$. With probability at least $1 - C\lambda \exp(-cs\log(n/s))$ over
the choice of $A$ and the randomness of $T_0$, the following holds for all $x \in R B_2^n \cap \Sigma_s$, all $e \in \mathbb{R}^m$
with $\|e\|_\infty \le \eta 2^{-T} \|x\|_2$, and all $f \in \{\pm 1\}^m$ with $|\{i : f_i = -1\}| \le b$ in the measurement model
(6) ($b = 0$ if $(T_0, \Delta_0)$ is based on convex programming or $b = cs\log(n/s)$ if $(T_0, \Delta_0)$ is based on
hard thresholding):

for $y \in \{\pm 1\}^m$ and $\tau = Q(Ax, A, s, R, q) \in \mathbb{R}^m$, the output $\hat{x}$ of $\Delta(y, A, s, R, \tau, q)$ satisfies

\[ \|x - \hat{x}\|_2 \le R\,2^{-c\lambda}. \tag{13} \]

The  positive  constants  y,  d ,  c,  and  C  above  are  absolute  constants. 


4  Proofs 

4.1  Exponentially  decaying  error  rate  from  order-one  recovery  schemes 

First, we prove Theorem 6, which states that, given an appropriate order-one recovery scheme, the
recovery algorithm $\Delta$ in Algorithm 8 converges with exponentially small reconstruction error when
the measurements are obtained by the quantizer $Q$ of Algorithm 7.
Proof of Theorem 6. For $x \in R B_2^n \cap \Sigma_s$, we verify by induction on $t \in \{0, 1, \ldots, T\}$ that

\[ \|x - x_t\|_2 \le R\,2^{-t}. \]

This induction hypothesis holds for $t = 0$. Now, suppose that it holds for $t - 1$, $t \in \{1, \ldots, T\}$.
Consider $\Delta_0(y^{(t)}, A^{(t)}, R_t, \tau^{(t)})$, the estimate returned by the order-one recovery scheme in (11).

By definition, the thresholds $\tau^{(t)}$ were obtained in step $t$ by running $T_0$ on $A^{(t)}(x - x_{t-1})$. Similarly,
the quantized measurements $y^{(t)}$ are formed by quantizing (with noise) the affine measurements

\[ A^{(t)} x - \sigma^{(t)} = A^{(t)}(x - x_{t-1}) - \tau^{(t)}. \]

Thus, we have effectively run the order-one recovery scheme on the $2s$-sparse vector $x - x_{t-1}$, whose
norm is at most $R\,2^{-(t-1)} = R_t$ by the induction hypothesis. By the
guarantee of the order-one recovery algorithm, with probability at least $1 - C\exp(-cq)$,

\[ \|(x - x_{t-1}) - \Delta_0(y^{(t)}, A^{(t)}, R_t, \tau^{(t)})\|_2 \le R_t/4 = R\,2^{-t+1}/4. \]

Suppose that this occurs. Let

\[ z = x_{t-1} + \Delta_0(y^{(t)}, A^{(t)}, R_t, \tau^{(t)}), \]

so $\|x - z\|_2 \le R\,2^{-t+1}/4$. Since $x_t = H_s(z)$ is the best $s$-term approximation to $z$, it follows that

\[ \|x - x_t\|_2 = \|x - H_s(z)\|_2 \le \|x - z\|_2 + \|H_s(z) - z\|_2 \le 2\|x - z\|_2 \le R\,2^{-t}. \]

Thus, the induction hypothesis holds for $t$. A union bound over the $T$ iterations completes the
proof, since the announced result is the inductive hypothesis in the case that $t = T$. □
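The only property of $H_s$ used in this proof is that $\|x - H_s(z)\|_2 \le 2\|x - z\|_2$ whenever $x$ is $s$-sparse. The following Python sketch checks this numerically on random instances; the function names and the noise level 0.1 are illustrative choices, not taken from the paper.

```python
import numpy as np

def hard_threshold(z, s):
    """H_s: keep the s largest-magnitude entries of z and zero out the rest."""
    out = np.zeros_like(z)
    keep = np.argpartition(np.abs(z), -s)[-s:]
    out[keep] = z[keep]
    return out

def max_ratio(n, s, trials, seed):
    """Largest observed ||x - H_s(z)||_2 / ||x - z||_2 over random s-sparse x."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(n)
        x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
        z = x + 0.1 * rng.standard_normal(n)   # perturbed, non-sparse estimate
        ratio = np.linalg.norm(x - hard_threshold(z, s)) / np.linalg.norm(x - z)
        worst = max(worst, ratio)
    return worst
```

The ratio can never exceed 2, because $H_s(z)$ is at least as close to $z$ as the $s$-sparse vector $x$ is.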


4.2  Hard-thresholding-based  order-one  recovery  scheme 

The proof of Theorem 3 relies on three properties of random matrices $A \in \mathbb{R}^{q \times n}$ with independent
standard normal entries. In their descriptions below, the positive constants $c$, $C$, and $d$ are absolute
constants.


• The restricted isometry property of order $s$ ([FR13, Theorems 9.6 and 9.27]): for any $\delta > 0$,
with failure probability at most $2\exp(-c\delta^2 q)$, the estimates

\[ (1 - \delta)\|x\|_2^2 \le \frac{1}{q}\|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2 \tag{14} \]

hold for all $s$-sparse $x \in \mathbb{R}^n$ provided $q \ge C\delta^{-2} s\log(n/s)$.

• The sign product embedding property of order $s$ ([JDDV13, PV13b]): for any $\delta > 0$, with
failure probability at most $8\exp(-c\delta^2 q)$, the estimates

\[ \left| \frac{\sqrt{\pi/2}}{q} \langle Aw, \mathrm{sign}(Ax) \rangle - \langle w, x \rangle \right| \le \delta \tag{15} \]

hold for all effectively $s$-sparse $w, x \in \mathbb{R}^n$ with $\|w\|_2 = \|x\|_2 = 1$ provided $q \ge C\delta^{-6} s\log(n/s)$.
• The $\ell_1$-quotient property ([Woj09] or [FR13, Theorem 11.19]): if $n \ge 2q$, then with failure
probability at most $\exp(-cq)$, every $e \in \mathbb{R}^q$ can be written as

\[ e = Au \quad \text{with} \quad \|u\|_1 \le d\sqrt{s_*}\,\|e\|_2/\sqrt{q}, \quad \text{where } s_* := \frac{q}{\log(n/q)}. \tag{16} \]
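As a quick numerical illustration of the sign product embedding (15), the Python sketch below compares $\frac{\sqrt{\pi/2}}{q}\langle Aw, \mathrm{sign}(Ax)\rangle$ with $\langle w, x\rangle$ for one random pair of unit-norm sparse vectors. It checks a single fixed pair only, whereas (15) is uniform over all effectively sparse pairs; the function name and the parameter values are illustrative assumptions.

```python
import numpy as np

def sign_product_gap(q, n, s, seed=0):
    """|sqrt(pi/2)/q * <Aw, sign(Ax)> - <w, x>| for one random sparse pair."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((q, n))

    def unit_sparse():
        v = np.zeros(n)
        v[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
        return v / np.linalg.norm(v)

    w, x = unit_sparse(), unit_sparse()
    lhs = np.sqrt(np.pi / 2) / q * ((A @ w) @ np.sign(A @ x))
    return abs(lhs - w @ x)
```

For a fixed pair the gap shrinks roughly like $1/\sqrt{q}$, which is consistent with the $\exp(-c\delta^2 q)$ failure probability in (15).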


Combining the $\ell_1$-quotient property and the restricted isometry property (of order $2s$ for a
fixed $\delta \in (0, 1/2)$, say) yields the simultaneous $(\ell_2, \ell_1)$-quotient property (use, for instance, [FR13,
Theorem 6.13 and Lemma 11.16]); that is, there are absolute constants $d, d' > 0$ such that every
$e \in \mathbb{R}^q$ can be written as

\[ e = Au \quad \text{with} \quad \|u\|_2 \le d\,\|e\|_2/\sqrt{q} \quad \text{and} \quad \|u\|_1 \le d'\sqrt{s_*}\,\|e\|_2/\sqrt{q}. \tag{17} \]


Proof  of  Theorem [3|  We  target  the  inequalities 

7^/2 


x 


X 


-Hs  (A *y) 


<  6  +  C4 


v^lM 


+  C51 


'  dH(y,  sign  (A*  +  e)) 


(18) 


The desired inequality (10) then follows modulo a change of constants, because $H'_s(A^* y)$ is the
best unit-norm approximation to $\sqrt{\pi/2}\, q^{-1} H_s(A^* y)$, so that


j^r-H'(A*y) 

< 

*  -HTiHAA.y) 

+ 

H's  (A*j /)  -  ^ Hs  (A *y) 

\m\2 

2 

11*112  d 

2 

q 

<  2 


x 


^J2 


x 


Hs  (A*y) 


With $s_* = q/\log(n/q)$ as in (16), we remark that it is enough to consider the case $s = cs_*$,
$c := c_1^{-1}\delta^{7}$. Indeed, the inequality $q \ge c_1\delta^{-7} s\log(n/s)$ yields $q \ge c^{-1} s\log(n/q)$, i.e., $s \le cs_*$. Then
(18) for $s$ follows from (18) for $cs_*$ modulo a change of constants because $H_s(A^* y)$ is the best
$s$-term approximation to $H_{cs_*}(A^* y)$, so that


x 


\x\ 


Hs  (A *y) 


< 

+ 

^Hs  (A *y) 

^Hcs,  (A*y) 

ll*ll2 

2 

q 

q 

<  2 


x 


\x\ 


Hcs *  (A *y) 


We now assume that $s = cs_*$. This reads $q = c_1\delta^{-7} s\log(n/q)$ and arguments similar to [FR13,
Lemma C.6(c)] lead to $q \ge (c_1\delta^{-7}/\log(e c_1\delta^{-7}))\, s\log(n/s)$. Thus, if $c_1$ is chosen large enough at the
start, we have $q \ge C\delta^{-6} s\log(n/s)$. This ensures that the sign product embedding property (15) of
order $2s$ with constant $\delta/2$ holds with high probability. Likewise, the restricted isometry property
(14) of order $2s$ with constant $9/16$, say, holds with high probability. In turn, the simultaneous
$(\ell_2, \ell_1)$-quotient property (17) holds with high probability.

We place ourselves in the situation where all three properties hold simultaneously, which occurs
with failure probability at most $c_2\exp(-c_3\delta^2 q)$ for some absolute constants $c_2, c_3 > 0$. Then, writing
$S = \mathrm{supp}(x)$ and $T = \mathrm{supp}(H_s(A^* y))$, we remark that $H_s(A^* y)$ is the best $s$-term approximation
to $A^*_{S \cup T}\, y$, so that


*  A'y) 

< 

*  vV2a*  „ 

11  11  AsuTy 

+ 

^Hs(A*y)-^A*SuTy 

11*112  ^ 

2 

II*  2  q 

2 

q  q 

<  2 


x 


V^/2 


x 


A*suTy 


2 

(19) 


We  continue  with  the  fact  that 

V^/2 


x 


X 


q 


~-^-*sutV 


< 

x  Wir  2 

n  I,  AsuTSign  (Ax  +  e) 

,  V^/2 

a*sut  (y  ~  sign  (Ax  +  e)) 

®2  q 

2  q 

(20) 


The second term on the right-hand side of (20) can be bounded with the help of the restricted
isometry property (14) as

\[
\begin{aligned}
\|A^*_{S \cup T}(y - \mathrm{sign}(Ax + e))\|_2^2 &= \langle A_{S \cup T} A^*_{S \cup T}(y - \mathrm{sign}(Ax + e)),\; y - \mathrm{sign}(Ax + e) \rangle \\
&\le \|A_{S \cup T} A^*_{S \cup T}(y - \mathrm{sign}(Ax + e))\|_2 \, \|y - \mathrm{sign}(Ax + e)\|_2 \\
&\le \sqrt{1 + 9/16}\,\sqrt{q}\; \|A^*_{S \cup T}(y - \mathrm{sign}(Ax + e))\|_2 \, \|y - \mathrm{sign}(Ax + e)\|_2 .
\end{aligned}
\]

Simplifying by $\|A^*_{S \cup T}(y - \mathrm{sign}(Ax + e))\|_2$, we obtain

\[ \|A^*_{S \cup T}(y - \mathrm{sign}(Ax + e))\|_2 \le \frac{5}{4}\sqrt{q}\; \|y - \mathrm{sign}(Ax + e)\|_2 = \frac{5}{2}\sqrt{q}\,\sqrt{d_H(y, \mathrm{sign}(Ax + e))}. \tag{21} \]


The first term on the right-hand side of (20) can be bounded with the help of the simultaneous
$(\ell_2, \ell_1)$-quotient property (17) and of the sign product embedding property (15). We start by
writing $Ax + e$ as $A(x + u)$ for some $u \in \mathbb{R}^n$ as in (17). We then notice that

1 1 x  T  tx 1 1 2  >  ||*||2  —  ||w||2  >  ||*||2  —  d  ||e||2  / \/q  >  (1  —  dee)  ||*||2  , 

||*  -T  IX II 1  <  11*11!  +  1 1 XX  111  <  Vs  ||x||2  +  d! ||e||2  / y/q  <  (^=  +  ^=0  VYs\ 


x 


Hence,  if  c§  is  chosen  small  enough  at  the  start,  then  we  have  ||x  +  xx|| x  <  \/2 s  ||x  +  tx||2,  i-e. ,  x  +  u 
is  effectively  (2s)-sparse.  The  sign  product  embedding  property  ( 15 )  of  order  2s  then  implies  that 


w, 


X  +  u 
\x  +  u\\ 


\A72 


A5ursign  (Ax  +  e) 


w. 


X  +  u 
\x  +  XX 1 1 


V^/2 


(Am,  sign  (A  (x  +  u))) 


5 

<  - 
~  2 
for all unit-normed $w \in \mathbb{R}^n$ supported on $S \cup T$. This gives


*  +  u 


^J2 


\x  +  u\ 


A^uTsign  (A*  +  e) 


“  2’ 


and  in  turn 


x 


^J2 


x 


ASUTsign  (A*  +  e) 


4  + 
4- 


X 


X  +  u 


X 


\x\ 


\x  +  U  ||2 
1 

II*  +  It  I 


X 


+ 


6  *  +  u  L  —  \\x  L 

<  -  +  - - „  "2  "2 '  + 


u 


u 

\x  T  lt||2  2 

<5  2  ||ii| 

■^2  + 


2  '  ||*  +  u||2  '  ||*  +  u||2  2  '  ||*  T  tt||2 

From  ||tt||2  <  d  ||e||2  j  yfq  and  ||*  +  u  II 2  —  (1  —  dee )  ||*||2  >  ||*||2  /2  for  cq  is  small  enough,  we  derive 
that 


* 


V4J2 


A5UT  (sign  (A*  +  e)) 


<  -  + 


4 d  || e | 


2  VvMW 


(22) 


l*ll2  Q 

Substituting (21) and (22) into (20) enables us to derive the desired result (18) from (19). □

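The proof of Theorem 4 presented next follows from Theorem 3.

Proof of Theorem 4. For later purposes, we introduce the constant

\[ C := \max_{\xi \in \left[\frac{1}{\sqrt{2}} - \frac{1}{20},\; \frac{2}{\sqrt{5}} + \frac{1}{20}\right]} |f'(\xi)| \ge 2, \qquad f(\xi) := 1 - \frac{\sqrt{1 - \xi^2}}{\xi}. \]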


Given $x \in R B_2^n \cap \Sigma_s$, we acquire a corrupted version $y_1 \in \{\pm 1\}^{q/2}$ of the quantized measurements
$\mathrm{sign}(A_1 x)$. Since the number of rows of the matrix $A_1 \in \mathbb{R}^{(q/2) \times n}$ is large enough for Theorem 3
to hold with $\delta_0 = \delta/(4(1 + 2C))$ instead of $\delta$, we obtain

\[ \left\| \frac{x}{\|x\|_2} - u \right\|_2 \le \delta_0 + c_4 c\,\delta + c_5 c'\delta \le 2\delta_0, \qquad u := H'_s(A_1^* y_1), \]

provided that the constants $c$ and $c'$ are small enough. With $x^{\natural}$ denoting the orthogonal projection
of $x$ onto the line spanned by $u$, we have

\[ \|x - x^{\natural}\|_2 \le \big\| x - \|x\|_2\, u \big\|_2 \le 2\delta_0 \|x\|_2 . \]

We now consider a unit-norm vector $v \in \mathbb{R}^n$ supported on $\mathrm{supp}(u)$ and orthogonal to $u$. The
situation in the plane spanned by $u$ and $v$ is summarized in Figure 3.

We point out that $\|x^{\natural}\|_2 \le \|x\|_2 \le R$ gave $\|x^{\natural}\|_2 < 2R$, but that $2R$ was just an arbitrary
choice to ensure that $\cos(\theta)$ stays away from 1; here, $\cos(\theta) \in [1/\sqrt{2}, 2/\sqrt{5}]$. Forming the $s$-sparse
vector $w := 2R \cdot (u + v)$, we now acquire a corrupted version $y_2 \in \{\pm 1\}^{q/2}$ of the quantized
measurements $\mathrm{sign}(A_2(x - w))$ on the $2s$-sparse vector $x - w$. Since the number of rows of the




Figure  3:  The  situation  in  the  plane  spanned  by  u  and  v. 

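matrix $A_2 \in \mathbb{R}^{(q/2) \times n}$ is large enough for Theorem 3 to hold with $\delta_0 = \delta/(4(1 + 2C))$ instead of $\delta$
and $2s$ instead of $s$, we obtain

\[ \left\| \frac{w - x}{\|w - x\|_2} - t \right\|_2 \le \delta_0 + c_4 c\,\delta + c_5 c'\delta \le 2\delta_0, \qquad t := -H'_s(A_2^* y_2). \]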


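We deduce that $t$ also approximates $(w - x^{\natural})/\|w - x^{\natural}\|_2$ with error

\[
\begin{aligned}
\left\| \frac{w - x^{\natural}}{\|w - x^{\natural}\|_2} - t \right\|_2
&\le \left\| \frac{w - x^{\natural}}{\|w - x^{\natural}\|_2} - \frac{w - x}{\|w - x^{\natural}\|_2} \right\|_2
+ \left\| \left( \frac{1}{\|w - x^{\natural}\|_2} - \frac{1}{\|w - x\|_2} \right)(w - x) \right\|_2
+ \left\| \frac{w - x}{\|w - x\|_2} - t \right\|_2 \\
&\le \frac{\|x - x^{\natural}\|_2}{\|w - x^{\natural}\|_2}
+ \frac{\big| \|w - x\|_2 - \|w - x^{\natural}\|_2 \big|}{\|w - x^{\natural}\|_2} + 2\delta_0 \\
&\le 2\,\frac{\|x - x^{\natural}\|_2}{\|w - x^{\natural}\|_2} + 2\delta_0
\le 2\,\frac{2\delta_0 \|x\|_2}{2R} + 2\delta_0 \le 4\delta_0 .
\end{aligned}
\]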


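It follows that $\langle t, v \rangle$ approximates $\langle (w - x^{\natural})/\|w - x^{\natural}\|_2, v \rangle = \cos(\theta)$ with error

\[ |\cos(\theta) - \langle t, v \rangle| = \left| \left\langle \frac{w - x^{\natural}}{\|w - x^{\natural}\|_2} - t,\; v \right\rangle \right| \le \left\| \frac{w - x^{\natural}}{\|w - x^{\natural}\|_2} - t \right\|_2 \|v\|_2 \le 4\delta_0 . \]

We then notice that

\[ \|x^{\natural}\|_2 = 2R - 2R\tan(\theta) = 2R f(\cos(\theta)), \]

so that $2R f(\langle t, v \rangle)$ approximates $\|x^{\natural}\|_2$ with error

\[ \big| \|x^{\natural}\|_2 - 2R f(\langle t, v \rangle) \big| = 2R\, |f(\cos(\theta)) - f(\langle t, v \rangle)| \le 2RC\, |\cos(\theta) - \langle t, v \rangle| \le 2RC \cdot 4\delta_0 = 8C\delta_0 R . \]

Here, we used the facts that $\cos(\theta) \in [1/\sqrt{2}, 2/\sqrt{5}]$ and that $\langle t, v \rangle \in [1/\sqrt{2} - 4\delta_0, 2/\sqrt{5} + 4\delta_0] \subseteq
[1/\sqrt{2} - 1/20, 2/\sqrt{5} + 1/20]$. We derive that

\[
\begin{aligned}
\big| \|x\|_2 - 2R f(\langle t, v \rangle) \big|
&\le \big| \|x\|_2 - \|x^{\natural}\|_2 \big| + \big| \|x^{\natural}\|_2 - 2R f(\langle t, v \rangle) \big| \\
&\le \|x - x^{\natural}\|_2 + \big| \|x^{\natural}\|_2 - 2R f(\langle t, v \rangle) \big| \\
&\le 2\delta_0 \|x\|_2 + 8C\delta_0 R \le 2(1 + 4C)\delta_0 R .
\end{aligned}
\]

Finally, with the estimate $\hat{x}$ for $x$ being defined as

\[ \hat{x} := 2R f(\langle t, v \rangle)\, u, \]

the previous considerations lead to the error estimate

\[
\|x - \hat{x}\|_2 \le \big\| x - \|x\|_2\, u \big\|_2 + \big| \|x\|_2 - 2R f(\langle t, v \rangle) \big| \, \|u\|_2 \le 2\delta_0 \|x\|_2 + 2(1 + 4C)\delta_0 R
\le 4(1 + 2C)\delta_0 R .
\]

Our initial choice of $\delta_0 = \delta/(4(1 + 2C))$ enables us to conclude that $\|x - \hat{x}\|_2 \le \delta R$. □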
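The geometric identity $\|x^{\natural}\|_2 = 2R - 2R\tan(\theta) = 2R f(\cos(\theta))$ behind the magnitude estimate can be checked in the plane spanned by $u$ and $v$. The Python sketch below does so in the idealized case where $\cos(\theta)$ is known exactly; in the proof it is only known up to $4\delta_0$ through $\langle t, v\rangle$. The function names are illustrative.

```python
import numpy as np

def f(xi):
    """f(xi) = 1 - sqrt(1 - xi^2) / xi, so that f(cos(theta)) = 1 - tan(theta)."""
    return 1.0 - np.sqrt(1.0 - xi ** 2) / xi

def magnitude_from_angle(a, R):
    """For x_nat = a * u with 0 <= a <= R and w = 2R (u + v), return
    (cos(theta), 2R f(cos(theta))), where theta is the angle between
    w - x_nat and v."""
    u = np.array([1.0, 0.0])
    v = np.array([0.0, 1.0])
    w = 2.0 * R * (u + v)
    d = w - a * u
    cos_theta = (d / np.linalg.norm(d)) @ v
    return cos_theta, 2.0 * R * f(cos_theta)
```

For every magnitude $a \in [0, R]$ the recovered value equals $a$ up to rounding, and $\cos(\theta)$ stays in $[1/\sqrt{2}, 2/\sqrt{5}]$, the range on which $f$ has a bounded derivative.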

4.3  Second-order-cone-programming-based  order-one  recovery  scheme 

Proof of Theorem 2. Without loss of generality, we assume that $R = 1/2$. The general argument
follows from a rescaling. We begin by considering the exact case in which $e = 0$. Observe that, by
the Cauchy-Schwarz inequality,

\[ \|x\|_1 \le \sqrt{\|x\|_0} \cdot \|x\|_2 \le \sqrt{s}. \]

Since $x$ is feasible for the program, we also have $\|\hat{x}\|_1 \le \sqrt{s}$. The result will follow from the
following two observations:

• $x, \hat{x} \in \sqrt{s} B_1^n \cap B_2^n$;

• $\mathrm{sign}(\langle a_i, \hat{x} \rangle - \tau_i) = \mathrm{sign}(\langle a_i, x \rangle - \tau_i)$, $i = 1, \ldots, q$.

Each equation $\langle a_i, z \rangle - \tau_i = 0$ defines a hyperplane perpendicular to $a_i$ and translated proportionally
to $\tau_i$; further, $x$ and $\hat{x}$ are on the same side of the hyperplane. To visualize this, imagine $\sqrt{s} B_1^n \cap B_2^n$
as an oddly shaped apple that we are trying to dice. Each hyperplane randomly slices the apple,
eventually cutting it into small sections. The vectors $x$ and $\hat{x}$ belong to the same section. Thus, we
ask: how many random slices are needed for all sections to have small diameter? Similar questions
have been addressed in a broad context in [PV14]. We give a self-contained proof that $O(s\log(n/s))$
slices suffice based on the following result [PV14, Theorem 3.1].

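Theorem 8 (Random hyperplane tessellations of $\sqrt{s} B_1^n \cap S^{n-1}$). Let $a_1, a_2, \ldots, a_q \in \mathbb{R}^n$ be independent
standard normal vectors. If

\[ q \ge C\delta^{-4} s\log(n/s), \]

then, with probability at least $1 - 2\exp(-c\delta^4 q)$, all $x, x' \in \sqrt{s} B_1^n \cap S^{n-1}$ with

\[ \mathrm{sign}\langle a_i, x \rangle = \mathrm{sign}\langle a_i, x' \rangle, \quad i = 1, \ldots, q, \]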


satisfy 


The  positive  constants  c  and  C  are  absolute  constants. 

We translate the above result into a tessellation of $\sqrt{s} B_1^n \cap B_2^n$ in the following corollary.

Corollary 9 (Random hyperplane tessellations of $\sqrt{s} B_1^n \cap B_2^n$). Let $a_1, a_2, \ldots, a_q \in \mathbb{R}^n$ be independent
standard normal vectors and let $\tau_1, \tau_2, \ldots, \tau_q$ be independent standard normal random
variables. If

\[ q \ge C\delta^{-4} s\log(n/s), \]

then, with probability at least $1 - 2\exp(-c\delta^4 q)$, all $x, x' \in \sqrt{s} B_1^n \cap B_2^n$ with

\[ \mathrm{sign}(\langle a_i, x \rangle - \tau_i) = \mathrm{sign}(\langle a_i, x' \rangle - \tau_i), \quad i = 1, \ldots, q, \]


satisfy 


The  positive  constants  c  and  C  are  absolute  constants. 


Proof. For any $z \in \sqrt{s} B_1^n \cap B_2^n$, we notice that $\mathrm{sign}(\langle a_i, z \rangle - \tau_i) = \mathrm{sign}(\langle [a_i, -\tau_i], [z, 1] \rangle)$, where the
augmented vectors $[a_i, -\tau_i] \in \mathbb{R}^{n+1}$ and $[z, 1] \in \mathbb{R}^{n+1}$ are the concatenations of $a_i$ with $-\tau_i$ and $z$
with 1, respectively. Thus, we have moved to the ditherless setup by only increasing the dimension
by one. Since

\[ \|[z, 1]\|_2 \ge 1 \quad \text{and} \quad \|[z, 1]\|_1 = \|z\|_1 + 1 \le \sqrt{s} + 1 \le \sqrt{4s}, \]

we may apply Theorem 8 after projecting on $S^n$ to derive

[*,  !]  [*',  1] 
ll[*,i]||2  II[*m]||2 


with probability at least $1 - 2\exp(-c\delta^4 q)$. We now show that the inequality (23) implies that
$\|x - x'\|_2 \le \delta/4$.

First note that


x  —  x' 


2<v^ 


X 

x' 

IIMIa 

IIIMIIa 

since $\|x\|_2 \le 1$. Subtract and add $x'/\|[x', 1]\|_2$ inside the norm and apply the triangle inequality to
obtain


Since $\|x'\|_2 \le 1$, we may remove $\|x'\|_2$ from in front of the second term in parentheses. Next, use
the inequality $a + b \le \sqrt{2} \cdot \sqrt{a^2 + b^2}$ on the two terms in parentheses. This bounds the right-hand
side by precisely


[*»!] 

[*',  1] 

IMIla 

II[*m]||2 

which  is  bounded  by  5/4  according  to  (23). 


□ 
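The lifting step used at the start of this proof, which absorbs the dither $\tau_i$ into one extra coordinate, can be illustrated with a few lines of Python. This is a sketch with hypothetical function names; it checks the sign identity only, not the tessellation bound.

```python
import numpy as np

def dithered_signs(A, tau, z):
    """sign(<a_i, z> - tau_i) for each row a_i of A."""
    return np.sign(A @ z - tau)

def augmented_signs(A, tau, z):
    """sign(<[a_i, -tau_i], [z, 1]>): the same bits, written without dithers
    in dimension n + 1."""
    A_aug = np.hstack([A, -tau[:, None]])
    z_aug = np.append(z, 1.0)
    return np.sign(A_aug @ z_aug)
```

Normalizing `z_aug` to unit length does not change the signs, which is why the ditherless Theorem 8 on the sphere $S^n$ applies to the lifted vectors.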


This corollary immediately completes the proof of Theorem 2 in the case $e = 0$. We now
turn to the general problem where $\|e\|_\infty \le c\delta^3$ and thus $\|e\|_2 \le c\delta^3\sqrt{q}$. We reduce to the exact
problem using the simultaneous $(\ell_2, \ell_1)$-quotient property, which guarantees that the error can
be represented by a signal with small $\ell_1$-norm. In particular, (17) implies that, with probability at
least $1 - \exp(-cq)$, there exists a vector $u$ satisfying

\[ e = Au, \qquad \|u\|_2 \le \delta/4, \qquad \|u\|_1 \le c_1\delta^3\sqrt{q/\log(n/q)}, \tag{24} \]

where $c_1$ is an absolute constant which we may choose as small as we need. We may now replace $x$
with $\tilde{x} = x + u$ and proceed as in the proof in the noiseless case. Reconstruction of $\tilde{x}$ to accuracy
$\delta/4$ yields reconstruction of $x$ to accuracy $\delta/2$, as desired. By replacing $x$ with $\tilde{x}$, we have (mildly)
increased the bound on the $\ell_1$-norm and the $\ell_2$-norm. Fortunately, $\|\tilde{x}\|_2 \le \|x\|_2 + \|u\|_2 \le 1$ and
thus $\tilde{x}$ remains feasible for the program. Further, $\tilde{x}$ is approximately sparse in the sense that
$\|\tilde{x}\|_1 \le \|x\|_1 + \|u\|_1 \le \sqrt{s} + c_1\delta^3\sqrt{q/\log(n/q)} =: \sqrt{\tilde{s}}$. To conclude the proof, we must show that
the  requirement  of  Theorem  [2J  namely  q  >  C'5~^slog(n/s),  implies  that  the  required  condition  of 
Corollary  [9J  namely  q  >  C5~is\og{n/s),  is  still  satisfied.  The  result  follows  from  massaging  the 
equations,  as  sketched  below. 

If $s \ge c_1\delta^6 q/\log(n/q)$, then $\sqrt{\tilde{s}} \le 2\sqrt{s}$ and the desired result follows quickly. Suppose then
that $s < c_1\delta^6 q/\log(n/q)$ and thus $\tilde{s} \le c_2\delta^6 q/\log(n/q)$. To conclude, note that

\[ C\delta^{-4}\tilde{s}\log(n/\tilde{s}) \le q \cdot C \cdot c_2 \cdot \frac{\delta^2}{\log(n/q)} \cdot \big( \log(n/q) + \log(1/c_2) + 6\log(1/\delta) + \log(\log(n/q)) \big) \le q, \]

where the first inequality follows since $s\log(n/s)$ is increasing in $s$ and thus $\tilde{s}$ may be replaced
by its upper bound, $c_2\delta^6 q/\log(n/q)$. The last inequality follows by taking $c_2$ small enough. This
concludes the proof. □


5  Numerical  Results 

This brief section provides several experimental validations of the theory developed above. The
computations, performed in MATLAB, are reproducible and can be downloaded from the second
author's webpage. The random measurement vectors $a_i$ were always generated as vectors with independent
standard normal entries. As for the random sparse vectors $x$, after a random choice of their
supports, their nonzero entries also consisted of independent standard normal variables.
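The test data described above can be generated as in the following Python sketch. It is a stand-in for the authors' MATLAB scripts, which are not reproduced here; the function names are hypothetical.

```python
import numpy as np

def random_sparse_vector(n, s, rng):
    """s-sparse vector: uniformly random support, standard normal nonzero entries."""
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    return x

def gaussian_measurements(m, n, rng):
    """m measurement vectors a_i (rows) with independent standard normal entries."""
    return rng.standard_normal((m, n))
```

Passing an explicit `numpy.random.Generator` keeps each experiment reproducible from its seed.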

Our first experiment (results not displayed here) verified on a single sparse vector that both its
direction and magnitude can be accurately estimated via order-one recovery schemes, while only
its direction could be accurately estimated using convex programs [PV13a, PV13b], $\ell_1$-regularized
logistic regression, or binary iterative hard thresholding [JLBB13]. We also noted the reduction
of the reconstruction error by several orders of magnitude from the same number $m$ of quantized
measurements when Algorithms 7 and 8 are used instead of the above methods. We remark in passing
that this number $m$ is significantly larger than the number of measurements in classical compressed
sensing with real-valued measurements, as intuitively expected.

Our second experiment corroborates the exponential decay of the error rate. The results are
summarized in Figure 4, whose logarithmic scale on the vertical axis confirms the behavior
$\log(\|x - \hat{x}\|_2/\|x\|_2) \le -c\lambda$ for the relative reconstruction error as a function of the oversampling factor
$\lambda = m/(s\log(n/s))$. The tests were conducted on four sparsity levels $s$ at a fixed dimension $n$ for
an oversampling ratio $\lambda$ varying through the increase of the number $m$ of measurements. The
number $T$ of iterations in Algorithms 7 and 8 was fixed throughout the experiment based on hard
thresholding and throughout the experiment based on second-order cone programming. The values
of all these parameters are reported directly in Figure 4. We point out that we could carry out
a more exhaustive experiment for the faster hard-thresholding-based version than for the slower
second-order-cone-programming-based version, both in terms of problem scale and of number of
tests.


Figure 4: Averaged relative error for the reconstruction of sparse vectors ($n = 100$) by the outputs
of Algorithms 7 and 8 based on (a) hard thresholding and (b) second-order cone programming as a
function of the oversampling ratio.


Our third experiment examines the effect of measurement errors on the reconstruction via Algorithms
7 and 8. Once again, the problem scale was much larger when relying on hard thresholding
than on second-order cone programming. The values of the size parameters are reported in Figure 5.
This figure shows how the reconstruction error decreases as the iteration count $t$ increases
in Algorithms 7 and 8. For the hard-thresholding-based version, see Figure 5(a), we observe an
error decreasing by a constant factor at each iteration when the measurements are totally accurate.
Introducing a pre-quantization noise $e \sim N(0, \sigma^2 I)$ in $y = \mathrm{sign}(Ax + e)$ does not affect
this behavior too much until the “noise floor” is reached. Flipping a small fraction of the bits
$\mathrm{sign}\langle a_i, x \rangle$ by multiplying them with $f_i = \pm 1$, most of which are equal to $+1$, seems to have an
even smaller effect on the reconstruction. However, these bit flips prevent the use of the
second-order-cone-programming-based version, as the constraints of the optimization problems become
infeasible. We still remark that the pre-quantization noise is not very damaging in this case
either, see Figure 5(b), where the results of an experiment using $\ell_1$-regularized logistic regression
in Algorithms 7 and 8 are also displayed.
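The two corruption mechanisms of this experiment, Gaussian pre-quantization noise $e$ and post-quantization bit flips $f$, can be simulated as in the Python sketch below. The function name and signature are hypothetical, and the noise level and number of flips are free parameters rather than the values used for Figure 5.

```python
import numpy as np

def corrupted_one_bit(A, x, tau, sigma, n_flips, rng):
    """Return y = f * sign(Ax - tau + e) together with the noise e and flips f.

    e has independent N(0, sigma^2) entries; f equals -1 on n_flips randomly
    chosen positions and +1 elsewhere.
    """
    m = A.shape[0]
    e = sigma * rng.standard_normal(m)
    f = np.ones(m)
    f[rng.choice(m, size=n_flips, replace=False)] = -1.0
    y = f * np.sign(A @ x - tau + e)
    return y, e, f
```

Setting `tau` to zero reproduces the ditherless model $y = f \circ \mathrm{sign}(Ax + e)$ mentioned in the text.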


6  Discussion 

6.1  Related  work 

The one-bit compressed sensing framework developed by Boufounos and Baraniuk [BB08] is a
relatively new line of work, with theoretical backing only recently being developed. Empirical
evidence and convergence analysis of algorithms for quantized measurements appear in the works
of Boufounos et al. and others [Bou09, BB08, LWYB11, ZBC10]. Theoretical bounds on recovery
error have only recently been studied, aside from results which model the one-bit setting as
classical compressed sensing with specialized additive measurement error [DPM09, JHF11, SG09].
Other settings analyze quantized measurements where the number of bits used depends on signal
parameters like sparsity level or the dynamic range [ACS09, GLP+10, GLP+13]. Boufounos develops
hierarchical and scalar quantization with modified quantization regions which aim to balance
the rate-distortion trade-off [Bou11, Bou12]. These results motivate our work but do not directly
apply to the compressed sensing setting.

Figure 5: Averaged relative error for the reconstruction of sparse vectors ($n = 100$) by the outputs
of Algorithms 7 and 8 based on (a) hard thresholding ($s = 15$, $m = 10^5$) and second-order cone
programming and (b) $\ell_1$-regularized logistic regression ($s = 10$, $m = 2 \cdot 10^4$) as a function of the
iteration count when measurement error is present.

Theoretical guarantees more in line with the objectives of this paper began with Jacques
et al. [JLBB13], who proved robust recovery from approximately $s\log n$ one-bit measurements.
However, the program used has constraints which require sparsity estimation, making it NP-hard
in general. Gupta et al. offer a computationally feasible method via a scheme which
either depends on the dynamic range of the signal or is adaptive [GNR10]. Plan and Vershynin
analyze a tractable non-adaptive convex program which provides accurate recovery without
these types of dependencies [PV13a, PV13b, ALPV14]. Other methods have also been
proposed, many of which are largely motivated by classical compressed sensing methods (see
e.g.  [BonlM  lMFm2l  lYYOl  21 IMBN1 31 1.TDDV1 3)). 

In order to break the bound (3) and obtain an exponential rather than polynomial dependence on
the oversampling factor, one cannot take traditional non-adaptive measurements. Several schemes
have employed adaptive samples, including the work of Kamilov et al., which utilizes a generalized
approximate message passing algorithm (GAMP) for recovery, with the adaptive thresholds
selected in line with this recovery method. Adaptivity is also considered in [GNR10], which allows
for a constant factor improvement in the number of measurements required. However, to the best of our
knowledge, our work is the first to break the bound given by (3).

Regarding the link between our methods and sparse binary regression, there are a number of
related theoretical results focusing on sparse logistic regression [NRWY12, Bun08, VDG08, Bac10,
RWL10, MVDGB08, KSST10], but these are necessarily constrained by the same limited accuracy
of the one-bit compressed sensing model discussed in Section 1.

We also point to the closely related threshold group testing literature, see e.g., [Che13]. In
many  cases,  the  statistician  has  some  control  over  the  threshold  beyond  which  the  measurement 
maps  to  a  one.  For  example,  the  wording  of  a  binary  survey  may  be  adjusted  to  only  ask  for  a 
positive  answer  in  an  extreme  case;  a  study  of  the  relationship  of  heart  attacks  to  various  factors 
may  test  whether  certain  subjects  have  heart  attacks  in  a  short  window  of  time  and  other  subjects 
have  heart  attacks  in  a  long  window  of  time.  The  main  message  of  this  paper  is  that  by  carefully 
choosing  this  threshold  the  accuracy  of  reconstruction  of  the  parameter  vector  x  can  be  greatly 
increased. 

6.2  Conclusions 

We have proposed a recursive framework for adaptive thresholding quantization in the setting of
compressed sensing. We have developed both a second-order-cone-programming-based method and
a hard-thresholding-based method for signal recovery from these types of quantized measurements.
Both of our methods feature a bound on the recovery error of the form $e^{-\Omega(\lambda)}$, an exponential
dependence on the oversampling factor $\lambda$. To the best of our knowledge, this is the first result of this
kind, and it improves upon the best possible dependence of $\Omega(1/\lambda)$ for non-adaptively quantized
measurements.